Predicting religion using neural network

By Aleksandar Petrovic, Faculty of Organization Sciences, University of Belgrade

In this experiment it will be shown how neural networks and Neuroph Studio are used when it comes to problems of classification. Several architectures will be tried out, and it will be determined which ones represent a good solution to the problem, and which ones do not.

Classification is a task that is often encountered in every day life. A classification process involves assigning objects into predefined groups or classes based on a number of observed attributes related to those objects. Although there are some more traditional tools for classification, such as certain statistical procedures, neural networks have shown to be an effective solution for this type of problems. There are a number of advantages to using neural networks - they are data driven, they are self-adaptive, they can approximate any function - linear as well as non-linear (which is quite important in this case because groups often cannot be divided by linear functions). Neural networks classify objects rather simply - they take data as input, derive rules based on those data, and make decisions.

The objective of this problem is to create and train neural network to predict the religion of European countries, given some attributes as input. First we need data set. The data that we use in this experiment can be found at Europe. Data that are collected refere to 49 European countries. Each country has 26 input attributes and 1 output attribute.

If we want to use this data set for classification, we need to normalize it. Type of neural network that will be used is multilayer perceptron with backpropagation.

In all input attributes we have 3 types of data (Integer, Boolean, Classification). While the integer and classification types of data have to be normalized, boolean data type don't have to be normalized because their value is already in the 0-1 interval (it is 0 or 1). The boolean attributes are from 8-15, 17-19, 21-24.

Integer values are going to be normailized using L infinitely metricum (every value divided by MAX value), integer attributes are 2, 3, 5 , 6, 7, 20; while the classification values(1, 4, 16, 25, 26) are going to be presented using bytes. Every byte is going to show the existence of certain classification attribute.

Normalized values are saved in ReligionResults.txt file because they will be used for training and testing neural network

After normalizing all data we can start with Neuroph Studio. First we will create new Neuroph project.

The project will be named ’PredictReligion’. After we have clicked 'Finish', a new project is created and it will appear in the 'Projects' window, in the top left corner of Neuroph Studio.

In order to neural network learn the problem we need traaining data set. The training data set consists of input signals assigned with corresponding target (desired output). The neural network is then trained using one of the supervised learning algorithms, which uses the data to adjust the network's weights and thresholds so as to minimize the error in its predictions on the training set. If the network is properly trained, it has then learned to model the (unknown) function that relates the input variables to the output variables, and can subsequently be used to make predictions where the output is not known.

Select training set file type and click 'Next'. After that, enter training set name. Select the type of supervised.

In general, if you use a neural network, you will not know the exact nature of the relationship between inputs and outputs – if you knew the relationship, you would model it directly. The other key feature of neural networks is that they learn input/output relationship through training. There are two types of training used in neural networks, with different types of networks using different types of training. These are supervised and unsupervised training, of which supervised is the most common. In supervised learning, the network user assembles a set of training data. The training data contains examples of inputs together with the corresponding outputs, and the network learns to infer the relationship between the two. In other words, supervised learning is used for classification. For an unsupervised learning rule, the training set consists of input training patterns only. Unsupervised learning, on other hand, is used for clustering.

Our, normalized, data set, that we create above, consists input and output values. Therefore we choose supervised learning. In field Number of inputs enter 54 and in field number of outputs enter 5 and click 'Next'.

Training set can be created in two ways. You can either create training set by entering elements as input and desired output values of neurons in input and output label, or you can create training set by choosing an option load file. The first method of data entry is time consuming, and there is also a risk to make a mistake when entering data. Since we already have training set we will choose second way.

Click on Choose File and find file named ReligionResults.txt. Then select tab as values separator. In our case values have been separated with tab. In some other case values of data set can be separated on the other way. When finished, click on 'Load'.

Now we need to create neural network. In this experiment we will analyze several architecture. Each neural network which we create will be type of Multi Layer Perceptron and each will differ from one another according to parameters of Multi Layer Perceptron.

This is perhaps the most popular network architecture in use today: the units each perform a biased weighted sum of their inputs and pass this activation level through a transfer function to produce their output, and the units are arranged in a layered feedforward topology. The network thus has a simple interpretation as a form of input-output model, with the weights and thresholds (biases) the free parameters of the model. Such networks can model functions of almost arbitrary complexity, with the number of layers, and the number of units in each layer, determining the function complexity.

To create Multi Layer Perceptron network click File -> New File and select desired project from Project drop-down menu, Neural Network file type as you see in picture below.

In new Multi Layer Perceptron dialog enter number of neurons. The number of input and output units is defined by the problem, so you need to enter 54 as number of input neurons and 5 as number of output neurons.

The number of hidden units to use is far from clear. If too many hidden neurons are used, the network will be unable to model complex data, resulting in a poor fit. If too few hidden neurons are used, then training will become excessively long and the network may overfit.

How about the number of hidden layers? For most problems, one hidden layer is normally sufficient. Therefore, we will choose one hidden layer. The goal is try to quickly find the smallest network that converges and then refine the answer by working back from there. Because of that, we will start with 3 hidden neurons and if the network fails to converge after reasonable period, we will restart training up to ten times, thus ensuring that it has not fallen into local minimum. If the network still fails to converge we will add another hidden neuron and repeat procedure.

Further, we check option 'Use Bias Neuron'. Bias neurons are added to neural networks to help them learn patterns. A bias neuron is nothing more than a neuron that has a constant output of 1. Because the bias neurons have a constant output of one they are not connected to the previous layer. The value of 1, which is called the bias activation, can be set to values other than 1. However, 1 is the most common bias activation.

If your values in the data set are in the interval between -1 and 1, choose Tanh transfer function. In our data set, values are in the interval between 0 and 1, so we used Sigmoid transfer function.

As learning rule choose Backpropagation With Momentum. Backpropagation With Momentum algorithm shows a much higher rate of convergence than the Backpropagation algorithm. Choose Dynamic Backpropagation algorithm if you have to training dynamic neural network, which contain both feedforward and feedback connections between the neural layers.

If you want to see neural network as a graph, just select 'Graph View'. Right nodes in first and second level are bias neurons that we explained above.

If we choose 'Block View' and look at the top left corner of View screen we will see that training set is empty. To traing Neural Network we need to put training data in that corner. To do that we will just click on training set that we created and click 'Train'. A new window will open, where we need to set the learning parameters, learning rate and momentum.

Next thing we should do is determine the values of learning parameters, learning rate and momentum.

Learning rate is one of the parameters which governs how fast a neural network learns and how effective the training is. Let us assume that the weight of some synapse in the partially trained network is 0.2. When the network is introduced with a new training sample, the training algorithm demands the synapse to change its weight to 0.7 (say) so that it can learn the new sample appropriately. If we update the weight straightaway, the neural network will definitely learn the new sample, but it tends to forget all the samples it had learnt previously. This is because the current weight (0.2) is a result of all the learning that it has undergone so far. So we do not directly change the weight to 0.7. Instead, we increase it by a fraction (say 25%) of the required change. So, the weight of the synapse gets changed to 0.3 and we move on to the next training sample. Proceeding this way, all the training samples are trained in some random order. Learning rate is a value ranging from zero to unity. Choosing a value very close to zero, requires a large number of training cycles. This makes the training process extremely slow. On the other hand, if the learning rate is very large, the weights diverge and the objective error function heavily oscillates and the network reaches a state where no useful training takes place.

The momentum parameter is used to prevent the system from converging to a local minimum or saddle point. A high momentum parameter can also help to increase the speed of convergence of the system. However, setting the momentum parameter too high can create a risk of overshooting the minimum, which can cause the system to become unstable. A momentum coefficient that is too low cannot reliably avoid local minima, and can also slow down the training of the system.

There are two stopping criteria. One is maximum error and second one is maximum number of learning iterations, which are intuitively clear.

We can see in pictures below that training was succesfull. After 319 iterations Neural Network succeeded to learn problem with error less than 0,01. We can test this network because the error is less than expected.

After the network is trained, we click 'Test', in order to see the total error, and all the individual errors. The result show us that total mean square error is aproximatly 0.0032, which is very good. Individual error are also good but we have here some extreme values that we must avoid. Lets look at results. Values of output show that the network recognize some of resultes with probability approximately 60%. With this information we can conclude that this Neural Network is not very good.

The picture shows only the part of the test but the rest of the results show pretty much the same.

Now we will try to decrease error and number of iteratons. We will update the weight of learning rate and increase it by 25%. In network window click Randomize button and then click Train button. That means that we will set value of 0.2 in learning rate label replace with a new value 0.3 and click 'Train' button.

In the table below for the next three sessions we will present the results of other trainings for these architecture. For other trainings is not given graphic.

Based on data from Table 1 can be seen that regardless of the parameters of training error is always below a specified level, even if we train the network through a different number of iterations. With the minimal number of hidden neurones(3 neurones) we succeeded in getting the result that has error less than desired.

Now we need to examine all the individual errors for every single instance and check if there are any extreme values. When you have a large data set, individual testing requires a lot of time. Instead of testing 54 observations we will random choose 5 observations which will be subjected to individual testing. Three following table will show the value of input, output and errors in 5 randomly selected observations. These values are taken from the window Test Results.

In introduction we mentioned that result can belong to one of five groups. So if is the country catolic, output would be 1, 0, 0, 0, 0, if is protestant it would be 0, 1, 0, 0, 0, if is muslim it would be 0, 0, 1, 0, 0, if is orthodox it would be 0, 0, 0, 1, 0, and finally if the religion of the country is luthern, output would be 0, 0, 0, 0, 1. After completion of testing would be ideal if the value of output after the test were the same as the output values before testing. As with other statistical methods, and classification using neural networks include errors that arise during the approximation. Individual error between the original and the assessed values are shown in Table 4.2.

For observation 49 we can say that there is reasonable mistake in classification. Therefor, we will continue training neural network by increasing learning rate to 0.5 and putting value of momentum 0.6.

At the beginning we said that the goal is try to quickly find the smallest network that converges and then refine the answer by working back from there. Since we find the smallest neural network do the following:

After 4 iterations total net error is 0.0028 and total mean square error is 0.00002. Values of errors of observations are given in table 4.3.

For observation 49 we can say that there is reasonable mistake in classification again. Because this network doesn't learned data perfectly, we will continue training neural network by decreasing learning rate to 0.3 and keeping value of momentum 0.6.

After 2 iterations total net error is very small, almost zero like the total mean square error. But what is the most interesting are the values of errors of observations. They are given in table 4.4.

Because this network learned data perfectly individual error will be equals to zero, except one but that is not a problem because it is very, very small, as we see in table 4.5.

Next Neural Network will have same number of input and output neurons but different number of neurons in hidden layer. We will use 5 hidden layer neurons. Network in named Religion2.

First training course, of second architecture, we will start with same values of learning rate and momentum like in TrainAttempt 1. First click on button 'Train'. In 'Set Learning parameters' dialog, field 'set Stopping criteria enter 0.01 as max error. In order to graphically display, the training of this network, was clearer. In field 'set Learning parameters', enter 0.2 for 'Learning rate' and 0.7 for 'Momentum'. After entering this values click on button 'Train'.

During the testing we successfully trained the neural network named Religion2. The summary of the results are shown in the Table 2.

Like in last attempt we will use the same values of parameters only we are now going to set the value of learning rate on 0.3.

During the testing we successfully trained the neural network named Religion2 white smaller error. The summary of the results are shown in the Table 2.

In previous two attempts we used differente values of learning parameters, so and this time we will use recommended values and we gone set learning rate on 0.5.

Following useful conclusion can be drawn from this training. We can see that the architecture of 5 hidden neurons is appropriate for this training set, because for continuing the training of the neural network we get the desired approximation of max error.

In the table below for the previous three sessions we will present the results of all trainings for the second architecture.

After several tries with different architecture and parameters we got results that are given in table 3. There is interesting pattern in data. If we look number of hidden neurons and total net eror we can see that higher number of neurons leads us to lesser total net error.

Recommendation: If you do not get the desired results, continue to gradually increase the training parameters. The neural network will definitely learn the new sample, and it would not forget all the samples it had learnt previously.

Advanced Training Techniques

When the training is complete, you will want to check the network performance. A learning neural network is expected to extract rules from a finite set of examples. It is often the case that the neural network memorizes the training data well, but fails to generate correct output for some of the new test data. Therefore, it is desirable to come up with some form of regularization.

One form of regularization is to split the training set into a new training set and a validation set. After each step through the new training set, the neural network is evaluated on the validation set. The network with the best performance on the validation set is then used for actual testing. Your new training set should consist of 80% - 90% of the original training set, and the remaining 10% - 20% would be classified in the validation set. Then you have to compute the validation error rate periodically during training and stop training when the validation error rate starts to go up. However, validation error is not a good estimate of the generalization error, if your initial set consists of a relatively small number of instances. Our initial set, we named it Religion, consists of only 49 instances. In this case 10% or 20%, of the original training set, consisted of the 5 or 10 instances. This is the insufficient number of instances to perform validation. In this case instead validation we will use a generalization as a form of regularization.

One way to get appropriate estimate of the generalization error is to run the neural network on the test set of data that is not used at all during the training process. The generalization error is usually defined as the expected value of the square of the difference between the learned function and the exact target.

In the following examples we will check the generalization error, such as from the example to the example we will increase the number of instances in the training set, which we use for training, and we will decrease the number of instances in the sets that we used for testing.

We will choose random 70% of instances of training set for training and remaining 30% for testing. First group will be called Religion70, and second Religion30.

Unlike previous training, now there is no need to create new neural network. Advanced Training Techniques consist in the fact that we examine the performance of existing architectures, using a new training and test set of data. Satisfactory results we found using architecture Religion1. By the end of this article we will use not only this architecture, but also the parameters of the training that we used in this architecture previously which brought us desired results. But before you open an existing architecture, create new training sets. First training set name it Religion70 and second one name it Religion30.

Now open neural network Religion1, select training set Religion70 and in new network window press button 'Train'. The parameters that we now need to set will be the same as the ones in previous training attempt: the maximum error will be 0.01, the Learning rate 0.3, and the Momentum 0.6. We will not limit the maximum number of iterations, and we will check 'Display error graph', as we want the see how the error changes throughout the iteration sequence. Then press 'Train' button again and see what will happen.

Although, problem contained fewer instances it took only 2 iterations to train this network. Because it managed to converge to total net error of 0.01 we can declare this training succesfull.

After successful training the neural network, we can test the same to discover wheter the results will be as good as the previous testing.

Unlike previous practice, where we have to train and test neural networks using the same training set, now we will use the second training set, named Religion30, to test network in which there are data that a neural network has not been seen.

So go to network window, select training set Religion30 and press button 'Test'.

Based on data that we get from the test that collected samples from 30% of the primary test data, we can make a conclusion that the neuron network has successfully learned data. For the test sample “Religion30” after trained network that we get by training the best architecture for training data “Religion70” network shows that it has flawlessly learned data (Individual error of test data is zero as we can see on a picture). That can also be noticed by looking the output values that are 0 or 1, what indicates that for the input values network can guarantee that it has recognized religion of the country for which it has the input values.

Because the neuron network has successfully tested 30% collected samples, we can make a conclusion that it is unnecessary to train that same network for samples that are smaller, because using the logic it would be successfully recognized with the 100% of sureness. Based on this we can say that using this network we can generalized the problem.

Now we will try something different, instead of generalization we will use validation. We trained network with 70% of data from primary data set (generalization) and we have made a conclusion that the network successfully learned data. We also proved that by testing 30% of data that have left. Now we are going to do the opposite, meaning we are going to train that same network with 10% and 20% of data from primary data set, and the rest we’ll use for testing (validation). Training set will be named Religion20 and Religion30, and the test set will be named Religion80 and Religion90.

Based on data that we are going to get from testing, we will see is the network perfect (is it possible to use it for further predictions and to say with certainty that this architecture will be able to learn new data without error and with same parameters). If the individual errors after testing are 0, and if the output values are 0 and 1 we can say that the validation is successful and that the neural network is perfect.

Training attempt	Training set	Testing set	Iterations	Total Net Error (during training)	Total Mean Square Error (during testing)
13.	20%	80%	101	0.00992	0.1722
14.	10%	90%	220	0.00997	0.1861

After training and testing we see in table that total mean square error has very large value what makes the individual errors above limit. We can say that this network failed to validate this problem.

During this experiment, we have built the smallest architecture(whit 3 hidden neurons) that provides most desirable results. We also made one basic training set and three other training sets based on it(10%, 20% and 70% of based training set), but we used only one to get a conclusion, and thet is the arhitekture whit tree hidden neurons and parameters of training like learning rate(0.3) and momentum (0.6). First we needed to normalize the original data using L infinity metricum. Through six steps we explained creation, training and testing of the neuron network. In the end we manage to test our data using most simple network, meaning our network is successfully finished what was required. Five different solutions tested in this experiment have shown that the choice of the number of hidden neurons is very important for the effectiveness of a neural network, with increasing number of neurons we can see that numer of iterations are reducing. We have concluded that one layer of hidden neurons is enough in this case.

Below is a table that summarizes this experiment. The best solution for the problem is marked in the table.

Training attempt 1

Training attempt 2

Training attempt 5

Training attempt 6

Training attempt 7

Advanced Training Techniques

Training attempt 12